Unsupervised Audio Scene Analysis
Author
Abstract
Motivation: While many companies are expending great effort on automatic speech recognition (ASR), little attention is being paid to general audio and to the long-term modeling of audio. Even an ASR system that could give a complete transcription of the words heard in an environment would lack vital information: who was talking, when they were talking, what the tone of the conversation was, whether someone slammed the door, whether someone used a harsh expletive, when these conversations tend to happen, and so on. In fact, this information is potentially of great value even without a transcription of what was said.
Similar resources
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Singing Voice Separation Using Spectro-Temporal Modulation Features
An auditory-perception inspired singing voice separation algorithm for monaural music recordings is proposed in this paper. Under the framework of computational auditory scene analysis (CASA), the music recordings are first transformed into auditory spectrograms. After extracting the spectral-temporal modulation contents of the time-frequency (T-F) units through a two-stage auditory model, we de...
Anti-social Behavior Detection in Audio-Visual Surveillance Systems
In this paper we propose a general purpose framework for detection of unusual events. The proposed system is based on the unsupervised method for unusual scene detection in web-cam images that was introduced in [1]. We extend their algorithm to accommodate data from different modalities and introduce the concept of time-space blocks. In addition, we evaluate early and late fusion techniques for...
Nonnegative Tensor Factorization with Frequency Modulation Cues for Blind Audio Source Separation
We present Vibrato Nonnegative Tensor Factorization, an algorithm for single-channel unsupervised audio source separation with an application to separating instrumental or vocal sources with nonstationary pitch from music recordings. Our approach extends Nonnegative Matrix Factorization for audio modeling by including local estimates of frequency modulation as cues in the separation. This permi...
A scheme for racquet sports video analysis with the combination of audio-visual information
Racquet sports video (e.g., table tennis, tennis, and badminton) is a very important category of sports video, yet it has received little attention in past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generating based on the combination of audio and visual information. Firstly, a supervised classification method is...